Distributed Set Expression Cardinality Estimation
Abstract
We consider the problem of estimating set-expression cardinality in a distributed streaming environment where rapid update streams originating at remote sites are continually transmitted to a central processing system. At the core of our algorithmic solutions for answering set-expression cardinality queries are two novel techniques for lowering data communication costs without sacrificing answer precision. Our first technique exploits global knowledge of the distribution of certain frequently occurring stream elements to significantly reduce the transmission of element state information to the central site. Our second technical contribution involves a novel way of capturing the semantics of the input set expression in a boolean logic formula, and using models (of the formula) to determine whether an element state change at a remote site can affect the set expression result. Results of our experimental study with real-life as well as synthetic data sets indicate that our distributed set-expression cardinality estimation algorithms achieve substantial reductions in message traffic compared to naive approaches that provide the same accuracy guarantees.
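The second technique can be illustrated with a minimal sketch. This is not the authors' algorithm, only a brute-force illustration of the underlying idea under stated assumptions: the set expression is encoded as a Boolean function over per-stream membership variables, and a site asks whether flipping its own variable can change the function's value under any assignment consistent with what is already known. The function name `can_affect`, the `known` dictionary, and the example expression (A ∩ B) - C are all hypothetical choices made for illustration.

```python
from itertools import product

def can_affect(expr, streams, changed, known):
    """Return True if flipping the membership bit of stream `changed`
    can alter the value of `expr` under at least one assignment of the
    remaining streams that is consistent with `known`.

    expr    -- callable taking a dict {stream_name: bool} (hypothetical encoding)
    streams -- list of all stream names appearing in the expression
    changed -- name of the stream whose element state changed
    known   -- dict of stream memberships already known for this element
    """
    # Streams whose membership for this element is unknown.
    free = [s for s in streams if s != changed and s not in known]
    for values in product([False, True], repeat=len(free)):
        assignment = dict(known)
        assignment.update(zip(free, values))
        before = expr({**assignment, changed: False})
        after = expr({**assignment, changed: True})
        if before != after:
            return True  # some model makes this change visible in the result
    return False  # no consistent model is affected; the update can be suppressed

# Example expression (an assumption for illustration): (A ∩ B) - C
def expr(m):
    return m["A"] and m["B"] and not m["C"]

streams = ["A", "B", "C"]
print(can_affect(expr, streams, "A", {}))           # nothing known about B or C
print(can_affect(expr, streams, "A", {"C": True}))  # element known to be in C
```

In this sketch, once an element is known to be in C, no change to its membership in A can move it into or out of (A ∩ B) - C, so a site holding stream A would not need to report that change. The exhaustive enumeration is exponential in the number of unknown streams and is used here only for clarity.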
Similar resources
A Simple and Efficient Estimation Method for Stream Expression Cardinalities
Estimating the cardinality (i.e. number of distinct elements) of an arbitrary set expression defined over multiple distributed streams is one of the most fundamental queries of interest. Earlier methods based on probabilistic sketches have focused mostly on the sketching algorithms. However, the estimators do not fully utilize the information in the sketches and thus are not statistically effic...
Exact Cardinality Query Optimization for Optimizer Testing
The accuracy of cardinality estimates is crucial for obtaining a good query execution plan. Today's optimizers make several simplifying assumptions during cardinality estimation that can lead to large errors and hence poor plans. In a scenario such as query optimizer testing it is very desirable to obtain the “best” plan, i.e., the plan produced when the cardinality of each relevant expression ...
An Efficient Distributed Compressed Sensing Algorithm for Decentralized Sensor Network
We consider the joint sparsity Model 1 (JSM-1) in a decentralized scenario, where a number of sensors are connected through a network and there is no fusion center. A novel algorithm, named distributed compact sensing matrix pursuit (DCSMP), is proposed to exploit the computational and communication capabilities of the sensor nodes. In contrast to the conventional distributed compressed sensing...
Maximum Likelihood Method for RFID Tag Set Cardinality Estimation using Multiple Independent Reader Sessions
In this paper, Radio Frequency IDentification (RFID) tag set cardinality estimation problem is considered under the model of multiple independent reader sessions with unreliable radio communication links in which transmission errors might occur. After the R-th reader session, the number of tags detected in j (j = 1, 2, ..., R) reader sessions is updated, which we call observed evidence. Then, i...
Uniform-in-Bandwidth Nearest-Neighbor Density Estimation
We are concerned with the nonparametric estimation of the density f(·) of a random variable [rv] X ∈ R by the nearest-neighbor [NN] method. The NN estimators are motivated as follows (see, e.g., Fix and Hodges [17]). Let X1,X2, . . . be independent and identically distributed [iid] random copies of X, with distribution function [df] F(x) := P(X ≤ x), for x ∈ R. Denote the empirical df based upo...
Publication date: 2004